Kobayashi Maru.
To merely mention the name Kobayashi Maru invites debate among Trekkies, the devoted followers of all things Star Trek. It is a test—a computer simulation. Participants take the test to evaluate their leadership skills by virtually commanding a spaceship traveling across the galaxy. The Kobayashi Maru test uncovers hidden weaknesses and unforeseen strengths—a practice not unlike user testing.
Now, it is your turn. You sit in the captain’s chair. Shortly after the test begins, you receive a distress call from the Kobayashi Maru, another spaceship, which sits damaged and immobile across a contested border. The ship’s crew cries out for your help. To rescue the Kobayashi Maru, you must cross the border. Yet, to do so could cause a war and lead to your own destruction.
Do you try to sneak across the border? Do you fight? Do you run? Whatever choice you make, whatever action you take, you will fail. Failing is certain, for this is how the test is designed.
The star of Star Trek was Captain James T. Kirk. If you are familiar with the story, you know that when he faced the Kobayashi Maru, he failed it, too. But on his third attempt, he was successful. How did he win in a no-win scenario? He cheated. He reprogrammed the computer simulation to turn a no-win scenario into a no-lose one. Kirk discovered the moments of failure within the simulation and changed them into moments of success. His actions demonstrate a vital lesson about testing: sometimes you have to lose to learn how to win.
For those who are new to user testing, the concept can sound frightening and dramatic. User testing exposes all your hard work to the whims and opinions of strangers. “What if they don’t like what we built?” you wonder. “What if they hate it?”
Take a moment to imagine people testing an application you designed. Test participants flow through your application, link by link and screen by screen. You start to think, “Hey, this testing thing isn’t so bad.” Then it happens.
A tester clicks a link. He pauses for a moment. He clicks the back button. He tries another button. He tries again. He gets lost. He gets frustrated. You watch your design take on damage, as he shoots barrages of criticisms and vents his anger into open space. “Raise the shields,” you scream. Alarms blare. Fires burn. Sparks shoot across the room as wires dangle from the rafters. Soon after, your once-promising application floats lifeless, scorched and battered, surrounded by a debris field of scribbled Post-It notes and haphazard observations.
Of course, that scenario is fictitious. Testing is far less dramatic and far more practical than many believe. More often than not, testers blame themselves for failures rather than the software they are testing. They feel incapable—sometimes even stupid. They direct their frustration inward, not at you. As software creators, we should never fear testers; we should only feel empathy for them. They experience moments of failure so that we may design moments of success.
User testing strives for discovery, not destruction. We discover the hidden weaknesses and unforeseen strengths of software: the stuff we do not otherwise notice as captains of our own creations.
Rather than the high-stakes drama of the previous testing scenario, testing tends to go much more like this. You sit in a room. You greet a participant as she walks in. “Thanks for coming in today,” you say. “Sure, I hope I can help,” she replies. You ask her to complete a task, such as buy an airline ticket online. She does her best. You record your observations. After a few minutes, her smile transforms into pursed lips. She lets out a small, “hmm.” Your ears perk up and your eyes widen, as you take notice of her mouse pointer floating across her screen. She searches and clicks. She searches and clicks again. The hmm becomes a “hmmpf!” She is lost. You wait a few seconds and ask, “How do you think you’d get back to the previous screen?” That is about as dramatic as it gets. No alarms. No fires. At most, you see a few sparks.
Qualitative and Quantitative Testing
Let’s start with a testing method you can use today: qualitative testing. You can run a qualitative test at any time during a project. It’s quick. It’s painless. It’s helpful.
Please take a moment and read the following paragraph aloud. Whisper it to yourself if you wish. Ready, go!
Testing is quick, painless, and helpful. I’m participating in a test right now.
If you read this line aloud, we just ran a qualitative test together, albeit a small one. I asked you to do something, and you attempted to do so. Qualitative testing shares similarities with surveying. In surveying, we ask people questions. In testing, we ask people to perform tasks. Qualitative testing offers insights based on what you observe when participants perform those tasks. For example, you might ask a participant to locate information about NASA’s Curiosity Mars rover mission. You notice that she first visits Google and searches for “NASA Mars.” You ask her why she chose that phrase. She replies, “I recall hearing about a NASA and Mars website.” She clicks the first link listed in the search results, “mars.nasa.gov.” After a few moments, she scrolls down the page, pauses to review it, and then finds a tout for “Looking for Curiosity?” You ask her why she paused, and she tells you she was looking for the word “Curiosity.”
On the surface, such a test reveals little insight; however, it may indicate the future behaviors of other users. The participant recalled hearing about a similar website, potentially signaling an audience’s exposure to press coverage. She clicked the first search result, possibly demonstrating which website a future user may choose. We witnessed her pausing and scanning the page for the term “Curiosity,” perhaps highlighting the importance of the term. All observations must be taken with a grain of salt, however. Qualitative tests help us understand how some users may perceive an experience, but they do not prove anything. Each observation poses a question: “Will other people experience the same?” User testing does not provide an answer, but we should take comfort in the ubiquity of this dilemma. As the medical researcher Jonas Salk once wrote, “What people think of as the moment of discovery is really the discovery of the question.”
Quantitative methods enter the equation when you score an observation. This score can be any type of quantification, but it is frequently a success/fail or numerical tally. For example, you observe 50% of participants cannot successfully find an interface button. How can we prove others will perform in a particular way?
A complete explanation of confidence intervals, error rates, samples, and populations warrants its own book, but the short answer is that truly quantitative tests require lots of participants. To reach 95% confidence with a ±5% margin of error, we would need to test approximately 377 randomly selected people (using a population of 20,000). That is quite an effort, and often one too daunting for a typical user test. This is why most user tests tend to be purely qualitative, or a mixture of qualitative and low-confidence quantitative.
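If you are curious where a figure like 377 comes from, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration (not from the book) using the standard sample-size formula for a proportion with a finite population correction; the function name and defaults are my own.

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Rough sample size for estimating a proportion.

    z = 1.96 corresponds to 95% confidence; p = 0.5 is the most
    conservative assumption about the true proportion.
    """
    # Sample size needed if the population were effectively infinite
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks the requirement slightly
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(20000))  # → 377
```

Notice how weakly the result depends on population size: even for a much larger population, the requirement stays near 385 participants, which is why the number of tests, not the size of your audience, is the real constraint.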
We could measure the time it takes users to complete a task. Participant A takes 2 minutes. Participant B takes 3 minutes. Participant C takes 4 minutes. Afterward, we tally the results, giving us an average of 3 minutes ([2 + 3 + 4] / 3 results = 3). The more participants we add to our test, the greater the confidence of our result. You will find that similar scoring can be used to determine the average of all sorts of numerical measurements. However, always be wary of making decisions based on small sample sizes alone. Augment your tests with qualitative questions to help bolster or challenge the numbers.
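The same tally can be done with Python's standard statistics module; a minimal sketch, with the three task times above as sample data:

```python
from statistics import mean, stdev

# Task completion times in minutes for participants A, B, and C
times = [2, 3, 4]

avg = mean(times)      # (2 + 3 + 4) / 3 = 3 minutes
spread = stdev(times)  # sample standard deviation: 1.0 minute

print(f"average: {avg} min, standard deviation: {spread} min")
```

Reporting the spread alongside the average is a cheap honesty check: with only three participants, a standard deviation of a full minute is a reminder of how little three data points can prove.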
Remote Testing
I am a remote testing convert. The idea of remote testing seemed absurd to me at first. How can you run a test without being in the room with the participant? How could you gauge the participant’s attitude, emotional state, or comfort level? Then I ran a few remote tests. I remained unconvinced until I heard a baby cry.
So, what does such a remote test tell us? For one, it tells us that whatever testing environment we set up in a laboratory will be light years ahead of what most participants have at home. It is a sobering realization that while software creators tend to have fast processors, high-resolution screens, and the latest OS updates, a sizable proportion of Americans do not. If your software is for home use, there is no better place to test software than on a participant’s home computer. Perhaps most importantly, participants often feel more comfortable in their own homes than they do in a lab. They pause. They tend to their kids. They answer phone calls. They use your software in the context of their own messy lives, not in the context of your organized lab.
Still, face-to-face interaction will always have its place in user testing. With the increased need to test gestures on mobile devices, it is helpful to see both participants and what they are testing. A participant may hold her phone with one hand and swipe with the other. She may turn her tablet from portrait to landscape and rest it on her knees. Someone may be vision impaired or hard of hearing—all things perhaps best suited to test in a controlled environment. Only you can determine when face-to-face or remote testing is preferable. Regardless of which you choose, you will still find value in the discoveries, sights, and sounds revealed when testing your work.
User Testing: The Final Frontier
You can test prototypes, visual design, wireframes, cocktail napkin sketches, behaviors, nomenclature, and sentiment—in fact, you can test almost anything. The border between ignorance and evidence is far-reaching but easily crossed. Only one obstacle blocks our way.
Although financial cost may occasionally be a barrier to user testing, the real impediment is fear. Fear that testing displaces prerogative. Fear that testing exposes ineptitude. Fear that testing threatens creativity. None of which proves true. Prerogative, ineptitude, and creativity remain whether you test or not. Testing allows you to discover the strengths and weaknesses of software—not of the people who created it. Once we accept this fact, we lower our shields. We seek out new knowledge and new challenges and boldly go where many others will.
Key Takeaways
Testers often blame themselves for failures.
User testing strives for discovery, not destruction.
User tests may not be representative of larger populations.
Qualitative user tests do not prove anything.
User tests tend to be purely qualitative, or a mixture of qualitative and low-confidence quantitative.
If your software is for home use, there is no better place to test software than on a participant’s home computer.
We can test almost anything.
Testing discovers the strengths and weaknesses of software—not of the people who created it.
Questions to Ask Yourself
Where is the best location to conduct the test?
What hardware and software do test participants typically use?
What tasks were completed successfully or abandoned by users?
Where did users struggle the least and the most?
What nonverbal cues (e.g., fidgeting in a chair, looking around the room, or wincing at the screen) did participants make during the test?
How long did users take to complete a task?
What interruptions happened to the user during the test (e.g., attended to a child, dealt with a computer problem, or answered a client call)?
How can I include my team in user testing observations?
How can I alleviate my team’s fears about user testing?